Preparations

Load the necessary libraries

library(car)       #for regression diagnostics
library(broom)     #for tidy output
library(ggfortify) #for model diagnostics
library(sjPlot)    #for outputs
library(knitr)     #for kable
library(effects)   #for partial effects plots
library(emmeans)   #for estimating marginal means
library(ggeffects) #for partial effects plots
library(modelr)    #for auxillary modelling functions
library(DHARMa)    #for residual diagnostics plots
library(performance) #for residuals diagnostics
library(see)         #for plotting residuals
library(tidyverse) #for data wrangling
theme_set(theme_classic())

Scenario

Polis et al. (1998) were intested in modelling the presence/absence of lizards (Uta sp.) against the perimeter to area ratio of 19 islands in the Gulf of California.

Uta lizard

Format of polis.csv data file

island ratio pa
Bota 15.41 1
Cabeza 5.63 1
Cerraja 25.92 1
Coronadito 15.17 0
.. .. ..
island Categorical listing of the name of the 19 islands used - variable not used in analysis.
ratio Ratio of perimeter to area of the island.
pa Presence (1) or absence (0) of Uta lizards on island.

The aim of the analysis is to investigate the relationship between island parimeter to area ratio and the presence/absence of Uta lizards.

Read in the data

polis = read_csv('../data/polis.csv', trim_ws=TRUE)
## Parsed with column specification:
## cols(
##   ISLAND = col_character(),
##   RATIO = col_double(),
##   PA = col_double()
## )
polis <- janitor::clean_names(polis)
glimpse(polis)
## Rows: 19
## Columns: 3
## $ island <chr> "Bota", "Cabeza", "Cerraja", "Coronadito", "Flecha", "Gemelose…
## $ ratio  <dbl> 15.41, 5.63, 25.92, 15.17, 13.04, 18.85, 30.95, 22.87, 12.01, …
## $ pa     <dbl> 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1
head(polis)
str(polis)
## tibble [19 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ island: chr [1:19] "Bota" "Cabeza" "Cerraja" "Coronadito" ...
##  $ ratio : num [1:19] 15.41 5.63 25.92 15.17 13.04 ...
##  $ pa    : num [1:19] 1 1 1 0 1 0 0 0 0 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ISLAND = col_character(),
##   ..   RATIO = col_double(),
##   ..   PA = col_double()
##   .. )

Exploratory data analysis

Model formula: \[ y_i \sim{} \mathcal{Bin}(n, p_i)\\ ln\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i \]

where \(y_i\) represents the \(i\) observed values, \(n\) represents the number of trials (in the case of logistic, this is always 1), \(p_i\) represents the probability of lizards being present in the \(i^{th}\) poluation, and \(\beta_0\) and \(\beta_1\) represent the intercept and slope respectively.

ggplot(polis, aes(y=pa, x=ratio))+
  geom_point()

ggplot(polis, aes(y=pa, x=ratio))+
  geom_point()+
  geom_smooth(method='glm', formula=y~x,
              method.args=list(family='binomial'))

Fit the model

polis.glm <- glm(pa ~ ratio, data = polis,
                family = binomial(link = "logit"))
summary(polis.glm)
## 
## Call:
## glm(formula = pa ~ ratio, family = binomial(link = "logit"), 
##     data = polis)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6067  -0.6382   0.2368   0.4332   2.0986  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   3.6061     1.6953   2.127   0.0334 *
## ratio        -0.2196     0.1005  -2.184   0.0289 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.287  on 18  degrees of freedom
## Residual deviance: 14.221  on 17  degrees of freedom
## AIC: 18.221
## 
## Number of Fisher Scoring iterations: 6

NOTE: making the probability a factor will mess up the emmeans backtransformation later on! Need to fit as numeric!

Model validation

autoplot(polis.glm, which = 1:6, label.repel = T)
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

Residual plots impossible to distinguish any differences. However, QQ plot looks ok, so seems to conform well to a binomial distribution. Cook’s d is showing the third point to have high influence and leverage.

DHARMa::simulateResiduals(polis.glm, plot = T)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.8121925 0.8900391 0.9791159 0.3742036 0.985986 0.3423198 0.7653927 0.4519977 0.01965412 0.6090317 0.2424086 0.5740225 0.1126201 0.7365416 0.00219597 0.0698516 0.7393137 0.9747781 0.3383206

We can clearly see in the DHARMa package’s right plot that there are no issues with the residuals here.

Partial plots

Three different ways to plot the predicted effect:

plot_model(polis.glm, type = "eff", show.data = T)
## $ratio

plot(allEffects(polis.glm, residuals = T), type = "response")

polis.glm %>% ggemmeans(~ ratio) %>% plot(add.data=T)

Model investigation / hypothesis testing

summary(polis.glm)
## 
## Call:
## glm(formula = pa ~ ratio, family = binomial(link = "logit"), 
##     data = polis)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6067  -0.6382   0.2368   0.4332   2.0986  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   3.6061     1.6953   2.127   0.0334 *
## ratio        -0.2196     0.1005  -2.184   0.0289 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 26.287  on 18  degrees of freedom
## Residual deviance: 14.221  on 17  degrees of freedom
## AIC: 18.221
## 
## Number of Fisher Scoring iterations: 6

There seems to be a negative relationship, whereas the ratio increases, the chance of the lizard occurring decreases!

To get the relationship, use exponential backtransform for log-link:

exp(coef(polis.glm)[1])
## (Intercept) 
##    36.82103

Lizards are 36 times as likely to be present rather than absent given the x-var/ ratio = 0.

exp(coef(polis.glm)[2])
##     ratio 
## 0.8028734

For every 1 unit change in ratio, the odds of a lizard occurring are 0.80 of what they were previously (declines by 20% per unit).

Predictions

tidy(polis.glm, conf.int = T)

LD50 = -intercept/slope

coef(polis.glm) %>% {-.[1]/.[2]} %>% as.numeric
## [1] 16.4242

Get Rasenbush pseudo-R^2:

1 - (polis.glm$deviance / polis.glm$null.deviance)
## [1] 0.4590197

Get another proposed pseudo-R^2, which is a more recent and more well accepted pseudo-R^2:

MuMIn::r.squaredLR(polis.glm)
## [1] 0.4700986
## attr(,"adj.r.squared")
## [1] 0.6273785

Summary figures

polis_grid <- polis %>% data_grid(ratio = seq_range(ratio, n=100))

newdata <- emmeans(polis.glm, ~ ratio, 
                   at = polis_grid, 
                   type = "response") %>% as.data.frame

newdata %>%
  ggplot(aes( x=ratio)) +
  geom_ribbon(aes(ymin=asymp.LCL, ymax=asymp.UCL),
              fill='blue', alpha=0.2) + 
  geom_line(aes(y = prob)) +
  geom_point(data = polis, aes(y = pa))

References

Polis, G. A., S. D. Hurd, C. D. Jackson, and F. Sanchez-Piñero. 1998. “Multifactor Population Limitation: Variable Spatial and Temporal Control of Spiders on Gulf of California Islands.” Ecology 79: 490–502.